CLAIMS

1. A method for discovering knowledge from text documents, the method
comprising the steps of:

extracting from text documents semi-structured meta-data, wherein the
semi-structured meta-data includes a plurality of entities and a plurality of
relations between the entities;

identifying from the semi-structured meta-data a plurality of key entities and a
corresponding plurality of key relations;

deriving from a domain knowledge base a plurality of attributes relating to
each of the plurality of entities relating to one of the plurality of key entities for
forming a plurality of pairs of key entity and a plurality of attributes related thereto;

formulating a plurality of patterns, each of the plurality of patterns relating to
one of the plurality of pairs of key entity and a plurality of attributes related thereto;

analyzing the plurality of patterns using an associative discoverer; and

interpreting the output of the associative discoverer for discovering
knowledge.
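As an illustration only, the semi-structured meta-data recited in claim 1 can be pictured as a list of entity-relation-entity triples. The sketch below assumes that reading; the `Relation` class, the `collect_entities` helper, and the company names are hypothetical and do not come from the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One relation between two entities (hypothetical representation)."""
    subject: str    # entity at the head of the relation
    predicate: str  # label of the relation
    obj: str        # entity at the tail of the relation


def collect_entities(relations):
    """Return the set of entities mentioned by a list of relations."""
    entities = set()
    for relation in relations:
        entities.add(relation.subject)
        entities.add(relation.obj)
    return entities


# Made-up meta-data, as if extracted from two sentences of free text.
meta_data = [
    Relation("AcmeCorp", "acquires", "WidgetCo"),
    Relation("AcmeCorp", "launches", "GadgetX"),
]
entities = collect_entities(meta_data)
```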

2. The method as in claim 1, wherein the step of extracting from text documents
comprises the step of extracting text content from documents containing at least one
type of text, image, audio, and video information.

3. The method as in claim 1, wherein the step of identifying the plurality of key
entities comprises the step of selecting the plurality of key entities according to at
least one of frequency of appearance of the plurality of key entities in the
semi-structured meta-data and obtaining user specification.
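A minimal sketch of the selection recited in claim 3, assuming the meta-data is a list of (subject, relation, object) triples. The function name `select_key_entities`, the `top_n` cutoff, and the sample data are assumptions made for illustration; the same counting could be applied to relation labels for claim 4.

```python
from collections import Counter


def select_key_entities(relations, top_n=1, user_specified=()):
    """Pick key entities by frequency of appearance, plus any user choices.

    relations: iterable of (subject, relation, object) triples.
    """
    counts = Counter()
    for subject, _relation, obj in relations:
        counts[subject] += 1
        counts[obj] += 1
    frequent = [entity for entity, _count in counts.most_common(top_n)]
    # User-specified entities come first; duplicates are dropped in order.
    return list(dict.fromkeys([*user_specified, *frequent]))


# Made-up triples: "AcmeCorp" appears three times, every other entity less.
triples = [
    ("AcmeCorp", "acquires", "WidgetCo"),
    ("AcmeCorp", "launches", "GadgetX"),
    ("AcmeCorp", "partners_with", "BetaInc"),
    ("BetaInc", "acquires", "GadgetX"),
]
by_frequency = select_key_entities(triples, top_n=1)
with_user_choice = select_key_entities(triples, top_n=1, user_specified=("WidgetCo",))
```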

4. The method as in claim 1, wherein the step of identifying the plurality of key
relations comprises the step of selecting the plurality of key relations according to at
least one of frequency of appearance of the plurality of key relations in the
semi-structured meta-data and obtaining user specification.

5. The method as in claim 1, wherein the step of deriving from the
domain knowledge base comprises the step of deriving from a domain knowledge
base relating to at least one of taxonomy, a concept hierarchy network, ontology, a
thesaurus, a relational database, and an object-oriented database.

6. The method as in claim 1, wherein the step of deriving the plurality of
attributes comprises the step of deriving a set of attributes or lower level entities
characterizing the plurality of entities relating to the plurality of key entities.
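One possible reading of the deriving step in claims 1 and 6 is a lookup of each related entity in a knowledge base, producing pairs of a key entity and the attributes of an entity related to it. The sketch below assumes that reading and uses a plain dictionary as a stand-in for the domain knowledge base; `KNOWLEDGE_BASE`, `derive_pairs`, and all names in the data are hypothetical.

```python
# Hypothetical stand-in for a domain knowledge base: entity -> attributes.
KNOWLEDGE_BASE = {
    "WidgetCo": ["manufacturing", "europe"],
    "GadgetX": ["consumer-electronics"],
}


def derive_pairs(relations, key_entity, key_relation, knowledge_base):
    """Pair the key entity with the attributes of each entity related to it.

    relations: iterable of (subject, relation, object) triples.
    Only triples whose subject is the key entity and whose relation is the
    key relation are considered (an assumption of this sketch).
    """
    pairs = []
    for subject, relation, obj in relations:
        if subject == key_entity and relation == key_relation:
            # Entities missing from the knowledge base get no attributes.
            attributes = knowledge_base.get(obj, [])
            pairs.append((key_entity, attributes))
    return pairs


triples = [
    ("AcmeCorp", "acquires", "WidgetCo"),
    ("AcmeCorp", "launches", "GadgetX"),
]
pairs = derive_pairs(triples, "AcmeCorp", "acquires", KNOWLEDGE_BASE)
```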

7. The method as in claim 1, wherein the step of formulating the plurality of
patterns comprises the step of formulating concatenated vector representations of the
plurality of attributes and the plurality of key entities relating to the corresponding
plurality of key relations.
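The concatenated vector representation of claim 7 could, for example, be built from two binary vectors joined end to end: one marking which attributes are present and one marking the key entity. This is a sketch under that assumption; the encoding, the vocabularies, and the function names are illustrative and are not specified by the claims.

```python
def one_hot(items, vocabulary):
    """Binary vector with a 1 for each vocabulary term present in items."""
    present = set(items)
    return [1 if term in present else 0 for term in vocabulary]


def formulate_pattern(attributes, key_entity, attribute_vocab, key_entity_vocab):
    """Concatenate the attribute vector with the key-entity vector."""
    return one_hot(attributes, attribute_vocab) + one_hot([key_entity], key_entity_vocab)


# Hypothetical vocabularies fixed in advance so every pattern has equal length.
attribute_vocab = ["consumer-electronics", "europe", "manufacturing"]
key_entity_vocab = ["AcmeCorp", "BetaInc"]

pattern = formulate_pattern(
    ["manufacturing", "europe"], "AcmeCorp", attribute_vocab, key_entity_vocab
)
```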

8. The method as in claim 1, wherein the step of analyzing the plurality of
patterns using the associative discoverer comprises the step of analyzing the plurality
of patterns using at least one of a neural network, a statistical system, and a symbolic
machine learning system.
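Claim 8 allows the associative discoverer to be a statistical system. As a deliberately simple stand-in (not the Adaptive Resonance Associative Map of claim 9), the sketch below estimates how often each attribute co-occurs with each key entity. The function name `attribute_confidence` and the example data are assumptions.

```python
from collections import Counter


def attribute_confidence(examples):
    """Estimate P(key entity | attribute) from co-occurrence counts.

    examples: iterable of (attributes, key_entity) pairs.
    Returns a dict mapping (attribute, key_entity) to a confidence in [0, 1].
    """
    attribute_counts = Counter()
    joint_counts = Counter()
    for attributes, key_entity in examples:
        for attribute in set(attributes):  # count each attribute once per example
            attribute_counts[attribute] += 1
            joint_counts[(attribute, key_entity)] += 1
    return {
        pair: count / attribute_counts[pair[0]]
        for pair, count in joint_counts.items()
    }


# Made-up training examples: attributes of a related entity, then the key entity.
examples = [
    (["europe", "manufacturing"], "AcmeCorp"),
    (["europe"], "AcmeCorp"),
    (["europe", "software"], "BetaInc"),
]
confidence = attribute_confidence(examples)
```

High-confidence pairs are candidates for the relations between attributes and key entities that the interpreting step is meant to surface; a real system would also need a minimum-support threshold to discard pairs seen only once.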

9. The method as in claim 8, wherein the step of analyzing the plurality of
patterns comprises the step of analyzing the plurality of patterns using an Adaptive
Resonance Associative Map.

10. The method as in claim 1, wherein the step of interpreting the output of the
associative discoverer for discovering knowledge comprises the step of discovering
the relations between the plurality of attributes and the plurality of key entities.

11. The method as in claim 1, further comprising the step of using a user interface
for displaying the semi-structured meta-data, the plurality of key entities, the plurality
of key relations, the plurality of attributes, and the knowledge discovered.

12. The method as in claim 1, further comprising the step of using a user
interface for obtaining user instruction for the plurality of key entities and the
plurality of key relations.

13. A computer program product comprising a computer usable medium having
computer readable program code means embodied in the medium for discovering
knowledge from text documents, the computer program product comprising:

computer readable program code means for extracting from text documents
semi-structured meta-data, wherein the semi-structured meta-data includes a plurality
of entities and a plurality of relations between the entities;

computer readable program code means for identifying from the
semi-structured meta-data a plurality of key entities and a corresponding plurality of key
relations;

computer readable program code means for deriving from a domain
knowledge base a plurality of attributes relating to each of the plurality of entities
relating to one of the plurality of key entities for forming a plurality of pairs of key
entity and a plurality of attributes related thereto;

computer readable program code means for formulating a plurality of patterns,
each of the plurality of patterns relating to one of the plurality of pairs of key entity
and a plurality of attributes related thereto;

computer readable program code means for analyzing the plurality of patterns
using an associative discoverer; and

computer readable program code means for interpreting the output of the
associative discoverer for discovering knowledge.

14. The computer program product as in claim 13, wherein the computer readable 
program code means for extracting from text documents comprises computer readable 
program code means for extracting text content from documents containing at least 
one of text, image, audio, and video information.

15. The computer program product as in claim 13, wherein the computer readable
program code means for identifying the plurality of key entities comprises computer
readable program code means for selecting the plurality of key entities
according to at least one of frequency of appearance of the plurality of key entities in
the semi-structured meta-data and obtaining user specification.

16. The computer program product as in claim 13, wherein the computer readable
program code means for identifying the plurality of key relations comprises computer
readable program code means for selecting the plurality of key relations according to
at least one of frequency of appearance of the plurality of key relations in the
semi-structured meta-data and obtaining user specification.

17. The computer program product as in claim 13, wherein the computer readable
program code means for deriving from the domain knowledge base comprises
computer readable program code means for deriving from a domain knowledge base
relating to at least one of taxonomy, a concept hierarchy network, ontology, a
thesaurus, a relational database, and an object-oriented database.

18. The computer program product as in claim 13, wherein the computer readable
program code means for deriving the plurality of attributes comprises computer
readable program code means for deriving a set of attributes or lower level entities
characterizing the plurality of entities relating to the plurality of key entities.

19. The computer program product as in claim 13, wherein the computer readable
program code means for formulating the plurality of patterns comprises computer
readable program code means for formulating concatenated vector representations of
the plurality of attributes and the plurality of key entities relating to the corresponding
plurality of key relations.

20. The computer program product as in claim 13, wherein the computer readable
program code means for analyzing the plurality of patterns using the associative
discoverer comprises computer readable program code means for analyzing the
plurality of patterns using at least one of a neural network, a statistical system, and a
symbolic machine learning system.

21. The computer program product as in claim 20, wherein the computer readable
program code means for analyzing the plurality of patterns comprises computer
readable program code means for analyzing the plurality of patterns using an
Adaptive Resonance Associative Map.

22. The computer program product as in claim 13, wherein the computer readable
program code means for interpreting the output of the associative discoverer for
discovering knowledge comprises computer readable program code means for
discovering the relations between the plurality of attributes and the plurality of key
entities.

23. The computer program product as in claim 13, further comprising computer
readable program code means for using a user interface for displaying the
semi-structured meta-data, the plurality of key entities, the plurality of key relations, the
plurality of attributes, and the knowledge discovered.

24. The computer program product as in claim 13, further comprising computer
readable program code means for using a user interface for obtaining user instruction
for the plurality of key entities and the plurality of key relations.

25. A system for knowledge discovery from free-text documents, comprising:

means for extracting semi-structured meta-data from the free-text documents;

means for identifying key entities and key relations from the semi-structured
meta-data;

a knowledge base that defines the attributes of entities;

means for formulating patterns based on the key entities and the attributes of
entities related to the key entities; and

means for analyzing the patterns for knowledge.

26. The system according to claim 25 wherein the semi-structured meta-data
comprises definition of entities and relations among the entities.

27. The system according to claim 25 wherein the semi-structured meta-data is
stored in a permanent or temporary storage.

28. The system according to claim 25 wherein the free-text documents comprise
text, image, audio, video, or any combination thereof.

29. The system according to claim 25 wherein the means for identifying key
entities selects entities according to at least one of the key entities' frequency of
appearance in the semi-structured meta-data and user's specification.

30. The system according to claim 25 wherein the means for identifying key 
relations selects relations according to at least one of the key relations' frequency of 
appearance in the semi-structured meta-data and user's specification.

31. The system according to claim 25 wherein the knowledge base comprises a 
taxonomy, a concept hierarchy network, an ontology, a thesaurus, a relational 
database, an object-oriented database, or any combination thereof. 

32. The system according to claim 25 wherein the attributes of entities comprise a
set of attributes or lower level entities characterizing the entities.

33. The system according to claim 25 wherein the training examples comprise
concatenated vectors of the key entities, and the attributes of entities related to the key
entities with a key relation.

34. The system according to claim 25 wherein the pattern analyzer comprises a
neural network, a statistical system, a symbolic machine learning system, or any
combination thereof.

35. The system according to claim 25 wherein the pattern analyzer comprises an
Adaptive Resonance Associative Map.

36. The system according to claim 25 wherein the knowledge comprises hidden
key relations between the attributes of the entities and the key entities.

37. The system according to claim 25 wherein the knowledge discovery system
further comprises a user interface for displaying the semi-structured meta-data, the
key entities, the key relations, the attributes, and the knowledge discovered.

38. The system according to claim 25 wherein the knowledge discovery system
further comprises a user interface for obtaining user's instruction for the key entities
and the key relations.



